On the Linear Number of Matching Substrings

نویسنده

  • Yo-Sub Han
چکیده

We study the number of matching substrings in the pattern matching problem. In general, there can be a quadratic number of matching substrings in the size of a given text. The linearizing restriction enables to find at most a linear number of matching substrings. We first explore two well-known linearizing restriction rules, the longest-match rule and the shortest-match substring search rule, and show that both rules give the same result when a pattern is an infix-free set even though they have different semantics. Then, we introduce a new linearizing restriction, the leftmost nonoverlapping match rule that is suitable for find-and-replace operations in text searching, and propose an efficient algorithm for the new rule when a pattern is described by a regular expression. We also examine the problem of obtaining the maximal number of non-overlapping matching substrings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Prefix-Free Regular-Expression Matching

We revisit the regular-expression matching problem with respect to prefix-freeness of the pattern. It is known that a prefix-free pattern gives only a linear number of matching substrings in the size of an input text. We improve the previous algorithm and suggest an efficient algorithm that finds all pairs (start, end) of start and end positions of all matching substrings with a single scan of ...

متن کامل

On a Parallel-Algorithms Method for String Matching Problems

Suux trees are the main data-structure in string matching algorithmics. There are several serial algorithms for suux tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial...

متن کامل

A New Linearizing Restriction in the Pattern Matching Problem

In the pattern matching problem, there can be a quadratic number of matching substrings in the size of a given text. The linearizing restriction finds, at most, a linear number of matching substrings. We first explore two well-known linearizing restriction rules, the longestmatch rule and the shortest-match substring search rule, and show that both rules give the same result when a pattern is a...

متن کامل

Efficient algorithms for the longest common subsequence in $k$-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...

متن کامل

Prefix-Free Regular-Expression Matching

We explore the regular-expression matching problem with respect to prefix-freeness of the pattern. We show that the prefix-free regular expression gives only linear number of matching substrings in the size of a given text. Based on this observation, we propose an efficient algorithm for the prefix-free regular-expression matching problem. Furthermore, we suggest an algorithm to determine wheth...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. UCS

دوره 16  شماره 

صفحات  -

تاریخ انتشار 2010